PATENT ABSTRACTS OF JAPAN 







(11)Publication number : 07-191920






(43)Date of publication of application : 28.07.1995 


(51)Int.Cl.

G06F 13/00
G06F 13/00
G06F 15/16
H04L 12/28




(21)Application number : 05-331908
(22)Date of filing : 27.12.1993


(71)Applicant : TOSHIBA CORP
(72)Inventor : IZUMI TAIICHIRO



(54) COMPUTER NETWORK 

(57)Abstract: 

PURPOSE: To quickly exclude or recover a fault of each computer on a communication 
network. 

CONSTITUTION: This computer network is constituted by connecting plural computers 
2 and a fault management computer 3 to a communication network 1, and the fault
management computer 3 has a fault management program 4 which performs collection 
of fault information, judgement of the fault occurrence condition, instruction of 
exclusion and recovery of the fault, etc., and each computer 2 has a fault management 
program 9 which receives fault information from a program 12 and collects it to 
generate fault report data and transmits this data to the fault management computer 3. 
(54) [Title of the Invention] Computer network

[Object] To speedily eliminate and recover from a fault of each
computer over a communication network.

[Arrangement] This computer network connects a plurality
of computers 2 and a fault management computer 3 to a
communication network 1. The fault management computer 3 has
a fault management program 4 for acquisition of fault
information, determination of a fault occurrence state, and
supply of fault elimination and recovery instructions.
Each computer 2 has a fault management program 9 for receiving
fault information from a program 12, producing fault
notification data by collecting such fault information, and
transmitting the data to the fault management computer 3.

[0015] Fig. 1 is a view showing a configuration of a
computer network according to one embodiment of the present
invention.

[0016] In Fig. 1, reference numeral 1 denotes a
communication network. Reference numeral 2 denotes a
plurality of computers connected to the communication network
1, each of which executes a program installed in itself.
Reference numeral 3 denotes a fault management computer 
connected to the communication network 1 , the computer 
periodically monitoring each computer 2 over the communication 
network 1, and managing the entire network. Reference numeral
4 denotes a fault management program installed in the fault 
management computer 3, the program performing acquisition of 
fault information, determination of a fault occurrence state, 
and fault elimination and recovery or the like. Reference 
numeral 5 denotes a fault determination portion in the fault 
management program, the fault determination portion 
performing determination of a fault occurrence state. 
Reference numeral 6 denotes fault information, which is, for
example, fault occurrence state data determined by the fault
determination portion 5 or data on the fault itself. Reference
numeral 7 denotes a storage device having fault information 
stored and saved therein. Reference numeral 8 denotes a 
database that has a correlation table having registered therein 
processing contents to be executed by the fault management 
program 4 according to a fault occurrence state. Reference 
numeral 9 denotes a fault management program installed in one 
computer of the above plurality of computers, which monitors 
a working state of the program in its own computer, that is, 
receives fault information from the program, reads general 
output information from the program, produces fault 
notification data by collecting the read output information, 
and transmits the data to the fault management computer. In 
addition, this fault management program 9 receives a fault 
elimination instruction or recovery instruction from the fault 
management computer, and, for example, forcibly ends the
program, restarts the stopped program, or the like,
according to the instruction. Reference numeral 10
denotes a fault information production portion in the fault 
management program, this portion producing the fault 
notification information. Reference numeral 11 denotes a 
fault elimination and recovery processing portion in the fault 
management program, this portion performing elimination and 
recovery of faults of the corresponding program based on the 
instruction contents. Reference numeral 12 denotes a program
installed in a computer; if a fault occurs in the program
itself, the program actively detects the fault information and
outputs it to the fault management program 9.
[0017] Hereinafter, an operation of this computer network
will be described with reference to Figs. 2 to 4. Fig. 2 is
a view showing a configuration of fault notification data
produced by each computer. Fig. 3 is a view showing one example
of a fault information character string in the configuration
of the fault notification data in Fig. 2. Fig. 4 is a view
showing contents of the correlation table in the database.
[0018] In the case of this computer network, the fault
management computer 3 monitors each computer 2 over the
communication network 1. In addition, the fault management
program 9 of the computer 2 (non-management computer) waits
until any information is inputted from the program 12 in its
own computer.

[0019] Here, the program 12 outputs, for example, information
such as fault information indicating that processing being
executed by the program 12 has stopped for some reason. When the
waiting fault management program 9 receives the information, it
produces, based on that fault information, fault notification
data composed of the names of the resources on which the program
operates, namely, a computer name 21, the name 22 of the program
in which the fault has occurred, and a fault information
character string (arbitrary character string) 23 or the like, as
shown in Fig. 2.
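
As an illustration only, the three items of the fault notification data in Fig. 2 can be sketched as a small record type. The class name, the field names, and the tab-separated wire format below are assumptions made for this sketch; the source does not specify how the data is encoded on the communication network 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultNotification:
    """Fault notification data laid out after Fig. 2 (names are hypothetical)."""
    computer_name: str  # item 21: computer on which the program operates
    program_name: str   # item 22: program in which the fault occurred
    fault_info: str     # item 23: arbitrary fault information character string


def build_notification(computer_name: str, program_name: str,
                       fault_info: str) -> FaultNotification:
    """Collect fault information from a program into one notification record."""
    return FaultNotification(computer_name, program_name, fault_info.strip())


def encode(n: FaultNotification) -> bytes:
    """Assumed wire format: three tab-separated UTF-8 fields."""
    return "\t".join((n.computer_name, n.program_name, n.fault_info)).encode("utf-8")


def decode(payload: bytes) -> FaultNotification:
    """Inverse of encode(); only the last field may itself contain tabs."""
    computer_name, program_name, fault_info = payload.decode("utf-8").split("\t", 2)
    return FaultNotification(computer_name, program_name, fault_info)
```

Keeping the fault information as a free-form string mirrors the "arbitrary character string" of item 23; interpretation is left to the receiving side.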

[0020] The fault information character string of the
fault notification data is provided as an arbitrary character
string describing the event that has occurred in the
non-management computer 2. As shown in Fig. 3, for example,
if program processing has stopped, "stop" is described. If
a program has ended, "end" is described. If a fault is
indicated in the form of a message, "fault: XXXX has occurred"
is described. If an initiation request is provided to another
program, "initiate program X" is described. If a fault occurs
with a magnetic disk device connected to the computer 2, "XXX
has occurred with disk XXXX" is described. In each case, the
character string corresponding to the information contents is
described.
[0021] The fault management program 9 of the computer 2
having produced the above fault notification data delivers
the fault notification data to the communication network 1.
[0022] On the other hand, the fault management computer
3 monitors each computer 2 over the communication network, and
the fault notification data is received by the fault management
program 4 through the communication network 1.
[0023] Then, the fault determination portion 5 of the
fault management program 4 stores the received fault
notification data in the storage device 7, and analyzes the
contents thereof.

[0024] When analyzing the contents of the fault notification
data, the fault determination portion 5 first samples the fault
information character string 23, for example "stop", from the
fault notification data. As shown in Fig. 4, the
sampled character string is collated with the contents of
the correlation table 41 of the database 8, and a command for
executing an optimal operation is acquired. In this
correlation table 41, there are registered operation
instruction commands (transmission of signal X, initiation of
next program, none, initiation of specified program, or none)
or the like that correspond to permissible fault information
character strings ("stop", "end", "fault: XXXX has
occurred", "initiate program X", and "XXX has occurred with
disk XXXX"). In this embodiment, for example, when the character
string "stop" is sampled, the command name "transmission of signal
X" is acquired from this correlation table 41.
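
The lookup in the correlation table 41 can be sketched as follows. This is a minimal illustration, assuming that the commands and the character strings pair up in the order in which they are listed above; the regular expressions standing in for the "XXXX" placeholders, and the names `CORRELATION_TABLE` and `lookup_command`, are hypothetical and do not come from the source.

```python
import re
from typing import Optional

# Rows assumed to follow the listing order described for Fig. 4.
# The patterns replace the "XXXX" placeholders with "any text".
CORRELATION_TABLE = [
    (re.compile(r"stop"), "transmission of signal X"),
    (re.compile(r"end"), "initiation of next program"),
    (re.compile(r"fault: .+ has occurred"), "none"),
    (re.compile(r"initiate program .+"), "initiation of specified program"),
    (re.compile(r".+ has occurred with disk .+"), "none"),
]


def lookup_command(fault_info: str) -> Optional[str]:
    """Return the command registered for a fault string, or None if unregistered."""
    for pattern, command in CORRELATION_TABLE:
        # fullmatch: the whole fault string must fit the registered form.
        if pattern.fullmatch(fault_info):
            return command
    return None
```

Returning `None` for an unregistered string is a design choice of this sketch; the source only describes behavior for permissible (registered) character strings.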
[0025] Then, the fault determination portion 5 refers to
the computer name and the program name in the fault notification
data, and executes processing of the acquired command, namely,
processing for transmitting, to that destination, signal X,
which is a signal for recovering the program 12 from the
processing stop. In the case where "none" is acquired from the
above correlation table 41, the fault determination portion
5 does not execute any processing for the computer 2.
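
The handling of an acquired command on the fault management computer 3 side might be sketched as below. The `Sender` callback is a hypothetical stand-in for whatever transport carries instructions over the communication network 1, and the function name is likewise an assumption.

```python
from typing import Callable, Optional

# Hypothetical transport: (computer_name, program_name, command) -> None
Sender = Callable[[str, str, str], None]


def handle_command(computer_name: str, program_name: str,
                   command: Optional[str], send: Sender) -> bool:
    """Forward an acquired command to the computer that reported the fault.

    Returns True if an instruction was sent, False if nothing was done.
    """
    if command is None or command == "none":
        # "none": no processing is executed for the computer.
        return False
    send(computer_name, program_name, command)
    return True


# Usage: collect what would be sent instead of using a real network.
sent = []
handle_command("host-a", "prog12", "transmission of signal X",
               lambda c, p, cmd: sent.append((c, p, cmd)))
handle_command("host-a", "prog12", "none",
               lambda c, p, cmd: sent.append((c, p, cmd)))
```

After the two calls, `sent` holds only the "transmission of signal X" instruction, since "none" results in no transmission.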
[0026] When signal X is transmitted through the
communication network 1, the fault management program 9 of the
destination computer 2 receives the signal. Then,
the fault elimination and recovery processing portion 11 executes
fault elimination and recovery processing, namely, processing
for supplying the signal X to the program 12, and then
initiating the program again.

[0027] In this way, according to the computer network of
the present embodiment, each computer 2 actively notifies the
contents of its own fault occurrence to the fault management
computer 3 connected to the communication network 1. Thus, the
fault management computer 3 that has been notified of a fault
immediately instructs an optimal operation based on the
notification contents, for example, restarts the
program 12 of the computer 2 whose processing has stopped, and
can eliminate and recover from faults speedily.
[0028] In addition, the fault notification data
received from the computer 2 is saved in the storage device
7. Thus, an operator can call up the stored data to
investigate the cause of a fault.

Now, another embodiment will be described with reference to
Fig. 5.

[0029] Although the above embodiment has illustrated a
case in which the program of the computer 2 is the active
program 12 that detects an error by itself, in the case of a
program 51 whose working state is, for example, monitored by the
fault management program 9, the fault management program 9
periodically monitors the program 51 at very short intervals.
[0030] In this case, the fault management program 9
performs coarse monitoring, checking only whether or not the
program 51 is executing processing (stop or end), so that
these two kinds of fault can be handled in a timely manner.
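
The coarse check described above can be sketched as a classification into "end", "stop", or still running. This is an illustration under assumptions: the source does not say how a stop is detected, so a progress timestamp and the threshold `stall_after` are hypothetical, as are all names used here.

```python
from enum import Enum
from typing import Optional


class ProgramState(Enum):
    RUNNING = "running"
    STOP = "stop"  # process exists but has made no progress recently
    END = "end"    # process has exited


def classify(exit_code: Optional[int], last_progress: float, now: float,
             stall_after: float = 5.0) -> ProgramState:
    """Coarse check: has the program ended, stalled, or is it still working?

    exit_code is None while the process is alive (the convention used by
    subprocess.Popen.poll()); last_progress and now are timestamps in
    seconds. stall_after is an assumed threshold, not a value from the source.
    """
    if exit_code is not None:
        return ProgramState.END
    if now - last_progress > stall_after:
        return ProgramState.STOP
    return ProgramState.RUNNING
```

A monitor would call `classify` at each polling interval and turn a `STOP` or `END` result into the corresponding fault information character string.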
[0031] In the case of the conventional program 52 that
does not know that the fault management computer 3 periodically
monitors each computer 2 over a network, information on faults
that have occurred in the course of executing the program 52
is generally stored in data storage means 53
in some form. The fault management program 9 can therefore read
out the fault information by periodically accessing the data
storage means 53 at very short intervals, in the same manner
as above. Thus, although an operator must cope with the faults,
a timely measure can be taken for them in the same manner
as above.
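
Periodic reading of the data storage means 53 might look like the sketch below. It assumes, purely for illustration, that the program 52 appends one fault record per line to a UTF-8 text file; the source says only that the information is stored in some form. The function name and the byte-offset bookkeeping are hypothetical.

```python
from typing import List, Tuple


def read_new_fault_lines(path: str, offset: int) -> Tuple[List[str], int]:
    """Return non-empty lines appended after byte `offset`, plus the new offset.

    Assumes records are always written as complete lines.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    lines = [ln for ln in data.decode("utf-8").splitlines() if ln.strip()]
    return lines, offset + len(data)
```

Each polling cycle passes in the offset returned by the previous call, so only records written since the last access are reported.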